22 research outputs found

    Privacy, Space and Time: a Survey on Privacy-Preserving Continuous Data Publishing

    Sensors, portable devices, and location-based services generate massive amounts of geo-tagged and/or location- and user-related data on a daily basis. The manipulation of such data is useful in numerous application domains, e.g., healthcare, intelligent buildings, and traffic monitoring. A high percentage of these data carry information about users' activities and other personal details, and thus their manipulation and sharing raise concerns about the privacy of the individuals involved. To enable data sharing that is secure from the perspective of users' privacy, researchers have already proposed various seminal techniques for the protection of users' privacy. However, the continuous fashion in which data are generated nowadays, and the high availability of external sources of information, pose more threats and add extra challenges to the problem. In this survey, we review the work done on data privacy for continuous data publishing and report on the proposed solutions, with a special focus on solutions concerning location or geo-referenced data.

    EFQ: Why-Not Answer Polynomials in Action

    One important issue in modern database applications is supporting the user with efficient tools to debug and fix queries, because such tasks demand both time and skill. One particular problem, known as the Why-Not question, focuses on the reasons why tuples are missing from query results. The EFQ platform demonstrated here has been designed in this context to efficiently leverage Why-Not Answer polynomials, a novel approach that provides the user with complete explanations to Why-Not questions and allows for automatic, relevant query refinements.

    Semi-automatic SQL Debugging and Fixing to solve the Missing-Answers Problem

    Asking questions is the driving force for scientific progress. But as important as it is to ask questions, it is equally important to be able to understand the answers obtained. In this way, we are able to verify the sanity of the question itself and, if necessary, refine it. In the same spirit, in this PhD we focus on SQL queries over relational databases with the aim to (1) pinpoint the reasons in the SQL query that led to missing answers (tuples that were expected but not obtained in the query result), (2) fix the SQL query so that it best fits the user's expectations, and (3) do both efficiently so as to be of practical use.

    Immutably Answering Why-Not Questions for Equivalent Conjunctive Queries

    Answering Why-Not questions consists in explaining to developers of complex data transformations or manipulations why their data transformation did not produce some specific results, although they expected it to do so. Different types of explanations that serve as Why-Not answers have been proposed in the past and are based on the available data, the query tree, or both. Solutions (partially) based on the query tree are generally more efficient and easier for developers to interpret than solutions based solely on data. However, algorithms producing such query-based explanations so far may return different results for reordered conjunctive query trees, and even worse, these results may be incomplete. Clearly, this represents a significant usability problem, as the explanations developers get may be partial and developers have to worry about the query tree representation of their query, losing the advantage of using a declarative query language. As a remedy to this problem, we propose the Ted algorithm, which produces the same complete query-based explanations for reordered conjunctive query trees.

    Query-Based Why-Not Provenance with NedExplain

    With the increasing amount of available data and transformations manipulating the data, it has become essential to analyze and debug data transformations. A sub-problem of data transformation analysis is to understand why some data are not part of the result of a relational query. One possibility to explain the lack of data in a query result is to identify where in the query we lost data pertinent to the expected outcome. A first approach to this so-called why-not provenance has recently been proposed, but we show that this first approach has some shortcomings. To overcome these shortcomings, we propose NedExplain, an algorithm to explain data missing from a query result. NedExplain computes the why-not provenance for monotone relational queries with aggregation. After providing necessary definitions, this paper contributes a detailed description of the algorithm. A comparative evaluation shows that it is both more efficient and more effective than the state-of-the-art approach.

    Technical Report: Adding Missing Words to Regular Expressions

    Regular expressions (regexes) are patterns that are used in many applications to extract words or tokens from text. However, even hand-crafted regexes may fail to match all the intended words. In this paper, we propose a novel way to generalize a given regex so that it also matches a set of missing (previously non-matched) words. Our method finds an approximate match between the missing words and the regex, and adds disjunctions for the unmatched parts appropriately. We show that this method not only improves the precision and recall of the regex, but also generates much shorter regexes than baselines and competitors on various datasets. This report complements our paper at the PAKDD 2018 conference [18].
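
    For orientation only, the following is a minimal sketch of the simplest conceivable baseline for this task: keeping the original regex and appending each still-unmatched word as a literal alternative. It is not the method of the report, which instead aligns the missing words with the regex and adds disjunctions only for the unmatched parts. The function name `add_missing_words` and the sample pattern are hypothetical and chosen purely for illustration.

```python
import re


def add_missing_words(pattern: str, missing: list[str]) -> str:
    """Naive generalization (illustrative baseline, not the report's algorithm).

    Keeps the original pattern and adds every word it does not already
    match as an escaped literal alternative.
    """
    # Only words the pattern fails to match in full need a new alternative.
    unmatched = [w for w in missing if re.fullmatch(pattern, w) is None]
    if not unmatched:
        return pattern
    # Escape each word so that it is matched literally.
    alternatives = "|".join(re.escape(w) for w in unmatched)
    # Wrap the original pattern in a non-capturing group to keep its scope.
    return f"(?:{pattern})|{alternatives}"


# Hypothetical example: a phone-number pattern that misses a space-separated form.
generalized = add_missing_words(r"\d{3}-\d{4}", ["555 1234", "555-0000"])
```

    A baseline of this kind grows by one alternative per missing word and does not generalize beyond the listed words, which is the sort of regex length the abstract's comparison against baselines is concerned with.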

    Efficiently and Effectively Answering Why-Not Questions based on Provenance Polynomials

    The problem of answering Why-Not questions consists in explaining why the result of a query does not contain some expected data, i.e., missing answers. To solve this problem, we resort to identifying where in the query data relevant to the missing answer were lost. Existing algorithms producing such query-based explanations rely on a query tree representation, potentially leading to different or partial explanations. This significantly impairs the effectiveness of computed explanations. Here we present an effective, query-tree independent representation of query-based explanations, for a wide class of Why-Not questions, based on provenance polynomials. We further describe an algorithm that efficiently computes the complete set of these explanations. An experimental evaluation validates our claims.